1. The Data Set

1.1. Procuring the data

In order to study how our country’s weather has behaved in the past years, we looked for a data set in several sources on the Internet. Unfortunately, we did not find any data set containing the metrics we needed, so we decided to use the Visual Crossing Weather API, which returns the weather conditions in Romania for the locations and time periods we specify.

We requested daily data from Jan 1st, 2011 until Dec 31st, 2021. This period was chosen because of the financial limitations of the API: procuring more data would have required more requests to the public API than its free quota allows.

For each request, the API returns a single .csv file, following the naming convention weather_YEAR_COUNTY.csv. For 41 Romanian counties and 11 years, we thus obtain 451 csv files. However, we need to work with a single data set, so we merge all of them into one in the code block below.
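Before merging, it may be worth checking that all 451 expected files were actually downloaded. The following is a minimal base-R sketch: the helpers expected_weather_files() and missing_weather_files() are hypothetical, and we assume the COUNTY part of each file name is a plain label such as Cluj.

```r
# Hypothetical helper: the file names we expect, following weather_YEAR_COUNTY.csv
expected_weather_files = function(years, counties) {
  grid = expand.grid(year = years, county = counties, stringsAsFactors = FALSE)
  sprintf("weather_%d_%s.csv", grid$year, grid$county)
}

# Hypothetical helper: expected files that are absent from a directory listing
missing_weather_files = function(present_paths, years, counties) {
  setdiff(expected_weather_files(years, counties), basename(present_paths))
}

# 11 years x 41 counties (placeholder county labels) gives 451 expected files
print(length(expected_weather_files(2011:2021, paste0("County", 1:41))))  # 451

# Only the 2011 file for Cluj is present, so the 2012 one is reported as missing
print(missing_weather_files("../weather_data/weather_2011_Cluj.csv", 2011:2012, "Cluj"))
```

In the project, present_paths could come from the same list.files() call used for merging, provided the county labels match the ones used in the file names.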

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
# Set to TRUE only on the first run, when "weather_2011-2021_Romania.csv" is not yet in our project
MERGE_ALL_DATASETS = FALSE

if (MERGE_ALL_DATASETS) {
  
  # list.files() expects a regular expression, not a glob: match names ending in ".csv"
  filenames_list = list.files(
    path = "../weather_data/",
    pattern = "\\.csv$",
    full.names = TRUE
  )
  
  # Read every per-year, per-county file and stack them into one data frame
  data = lapply(filenames_list, read_csv) %>% bind_rows()
  
  write.csv(data, "weather_2011-2021_Romania.csv", row.names = FALSE, na = "")
}

data = read_csv("weather_2011-2021_Romania.csv", show_col_types = FALSE)
rm(MERGE_ALL_DATASETS)

1.2. The data set’s columns

TODO: Describe the meaning of each relevant column using this documentation

2. Cleaning the Data Set

The data set we obtained is mostly in good shape. However, a quick scan revealed a few aspects that required some attention before diving into the exploratory analysis (e.g. complex columns that should be split into multiple simpler ones).

To add value to the data set, we also do some feature engineering and create additional columns that increase the diversity of the data types. This diversity allows for new types of plots and opens up more questions, possibly with more surprising insights into the Romanian weather.

We start by removing blank spaces from column names for easier writing of the R code (e.g. “Minimum Temperature” becomes “MinimumTemperature”), and by dropping the redundant columns ResolvedAddress and Name.

names(data) = gsub(' ', '', names(data))
data = data %>% select(-c("ResolvedAddress", "Name"))

2.1. Splitting the Weather Type & Conditions columns

The data set contains two columns, Weather Type and Conditions, holding comma-separated strings of the weather conditions identified at a specific time and location. Since several conditions can be reported on the same day, instead of a single column describing all of them, we create a separate column for each weather type and condition.

As a result, all of these new columns are of boolean type, describing whether the given weather type or condition was identified. There are 37 possible weather types and 4 possible conditions in the data set, so 41 new columns are created.
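The splitting step can be sketched in base R on a few hypothetical WeatherType values. This sketch matches whole comma-separated tokens; by contrast, grepl(weather_type, WeatherType, fixed = TRUE), as used in the loops below, matches substrings, so a type such as Rain is also flagged on days that only report Light Rain. Whether this matters depends on how the counts are interpreted later.

```r
# Hypothetical values in the WeatherType format: comma-separated, NA when missing
weather = c("Mist, Fog, Light Rain", "Rain, Rain Showers", NA)

# Split each string into its individual weather types
tokens = strsplit(weather, ", ", fixed = TRUE)
types = unique(unlist(tokens[!is.na(weather)]))

# One logical column per type: NA where WeatherType is NA, else exact token membership
one_hot = sapply(types, function(type) {
  present = vapply(tokens, function(parts) type %in% parts, logical(1))
  ifelse(is.na(weather), NA, present)
})

print(one_hot[, "Rain"])                     # FALSE TRUE NA (whole-token match)
print(grepl("Rain", weather, fixed = TRUE))  # TRUE TRUE FALSE (substring match)
```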

accumulate_weather_func = function(accumulated_types, weather) {
  if (!is.na(weather)) {
    conditions = strsplit(weather, ", ")[[1]]
    accumulated_types = union(accumulated_types, conditions)
  }
  
  return(accumulated_types)
}

# Compute the list of possible weather types in this data set
weather_types = reduce(data$WeatherType, accumulate_weather_func, .init = c())

cat("Number of possible weather types:", length(weather_types))
## Number of possible weather types: 37
# Create the weather type columns
for (weather_type in weather_types) {
  column_name = gsub(' ', '', weather_type)  # remove spaces
  column_name = gsub('/', 'Or', column_name)  # replace slashes with Or
  
  data[[column_name]] = with(
    data,
    # NA when WeatherType is NA; otherwise TRUE if the weather type appears in the string
    if_else(
      is.na(WeatherType),
      NA,
      grepl(weather_type, WeatherType, fixed = TRUE)
    )
  )
}

rm(column_name, weather_type)

# Display an example of some weather columns
data %>%
  slice(42:46) %>%
  select(c(WeatherType, Fog, LightRain, SnowShowers, Duststorm))
## # A tibble: 5 x 5
##   WeatherType                              Fog   LightRain SnowShowers Duststorm
##   <chr>                                    <lgl> <lgl>     <lgl>       <lgl>    
## 1 Mist, Fog, Light Rain                    TRUE  TRUE      FALSE       FALSE    
## 2 Snow Showers, Mist, Rain, Rain Showers,~ FALSE TRUE      TRUE        FALSE    
## 3 Mist                                     FALSE FALSE     FALSE       FALSE    
## 4 <NA>                                     NA    NA        NA          NA       
## 5 Mist                                     FALSE FALSE     FALSE       FALSE
# Compute the list of possible conditions in this data set
conditions = reduce(data$Conditions, accumulate_weather_func, .init = c())

cat("Number of possible conditions:", length(conditions))
## Number of possible conditions: 4
# Create the conditions columns
for (condition in conditions) {
  column_name = paste0("Condition", condition)
  
  if (condition == "Partially cloudy") {
    column_name = "ConditionPartiallyCloudy"
  }
  
  data[[column_name]] = with(
    data,
    # NA when Conditions is NA; otherwise TRUE if the condition appears in the string
    if_else(
      is.na(Conditions),
      NA,
      grepl(condition, Conditions, fixed = TRUE)
    )
  )
}

rm(column_name, condition)

# Display an example of the conditions columns
data %>%
  slice(11:15) %>%
  select(c(Conditions, ConditionClear, ConditionPartiallyCloudy, ConditionOvercast, ConditionRain))
## # A tibble: 5 x 5
##   Conditions      ConditionClear ConditionPartia~ ConditionOverca~ ConditionRain
##   <chr>           <lgl>          <lgl>            <lgl>            <lgl>        
## 1 Clear           TRUE           FALSE            FALSE            FALSE        
## 2 Partially clou~ FALSE          TRUE             FALSE            FALSE        
## 3 Overcast        FALSE          FALSE            TRUE             FALSE        
## 4 Overcast        FALSE          FALSE            TRUE             FALSE        
## 5 Rain, Overcast  FALSE          FALSE            TRUE             TRUE

2.2. Categorizing the Wind Direction

In the data set, the wind direction is expressed as a real number between 0 and 360 degrees, measured clockwise from north. We can express this value in a categorical form by assigning a named compass direction to each 22.5-degree sector of the compass rose. In this way, we obtain 16 categories for 16 subdivisions, as shown in the table below.

wind_direction_table = data.frame(
  "Cardinal_Point" = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"),
  "From" = c(348.75, seq(11.25, 326.25, 22.5)),
  "To" = seq(11.25, 348.75, 22.5)
)

Table for Wind Cardinal Directions and Degrees

Using the above table, we can take each value from the WindDirection column and build the new column WindCardinalDirection.

# Break down the numeric direction into more categories
data = data %>%
  mutate(
    WindCardinalDirection = cut(
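      x = WindDirection,
      breaks = c(0, wind_direction_table$To, 360),
      # The last interval (348.75, 360] wraps around to "N" as well
      labels = c(wind_direction_table$Cardinal_Point, "N"),
      # Without include.lowest, a direction of exactly 0 degrees would become NA
      include.lowest = TRUE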
    )
  )

# Display an example of some wind directions
data %>%
  slice_sample(n=5) %>%
  select(c(WindDirection, WindCardinalDirection))
## # A tibble: 5 x 2
##   WindDirection WindCardinalDirection
##           <dbl> <fct>                
## 1         267.  W                    
## 2         153.  SSE                  
## 3         114.  ESE                  
## 4         128.  SE                   
## 5          54.2 NE
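The same mapping can also be computed arithmetically instead of with a lookup table. In this minimal sketch, degrees_to_compass() is a hypothetical helper: shifting by half a sector (11.25 degrees) and dividing by the sector width (22.5 degrees) gives the sector index directly. Here sectors are closed on the left, whereas cut() closes them on the right by default, so a value lying exactly on a boundary (e.g. 11.25) may land in the neighboring sector.

```r
# The 16 compass points, clockwise from north; each covers 360 / 16 = 22.5 degrees
compass_points = c("N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
                   "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW")

degrees_to_compass = function(degrees) {
  # Shift by half a sector so that N covers [348.75, 360) and [0, 11.25),
  # then wrap the sector index around with %% 16
  sector = floor(((degrees %% 360) + 11.25) / 22.5) %% 16
  compass_points[sector + 1]
}

print(degrees_to_compass(c(0, 54.2, 153, 267, 359)))  # "N" "NE" "SSE" "W" "N"
```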

2.3. Categorizing the Visibility

The data set presents a column called Visibility that represents the distance, measured in kilometers, at which an object or light can be clearly discerned.

Following the visibility categories defined by the National Weather Service, we can break these distances down into 4 classes, creating a new column called VisibilityLevel, a categorical variable whose levels are ordered from the worst to the best visibility.

The 4 classes are the following:

  • Very Poor - Less than 0.5 nautical miles (< 0.926 km)
  • Poor - 0.5 to less than 2 nautical miles (0.926 km - 3.704 km)
  • Moderate - 2 to 5 nautical miles (3.704 km - 9.26 km)
  • Good - Greater than 5 nautical miles (> 9.26 km)
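The kilometer thresholds follow from the definition of the international nautical mile, exactly 1.852 km. The small sketch below (nmi_to_km() is a hypothetical helper) recomputes them and shows how cut() treats values lying exactly on a threshold: with its default right = TRUE, intervals are closed on the right, so exactly 0.5 and 2 nautical miles fall one class lower than the definition above says. Such exact ties are probably rare in practice.

```r
# 1 international nautical mile is defined as exactly 1.852 km
nmi_to_km = function(nmi) nmi * 1.852

# Class thresholds of 0.5, 2 and 5 nautical miles, converted to km
visibility_breaks_km = nmi_to_km(c(0.5, 2, 5))
print(visibility_breaks_km)  # 0.926 3.704 9.260

# Values lying exactly on the thresholds, classified with the same breaks as below
boundary_levels = cut(
  x = c(0.926, 3.704, 9.26),
  breaks = c(0, 0.926, 3.704, 9.26, +Inf),
  labels = c("Very Poor", "Poor", "Moderate", "Good"),
  include.lowest = TRUE
)
print(as.character(boundary_levels))  # "Very Poor" "Poor" "Moderate"
```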
data = data %>%
  mutate(
    VisibilityLevel = cut(
      x = Visibility,
      breaks = c(0, 0.926, 3.704, 9.26, +Inf),
      labels = c("Very Poor", "Poor", "Moderate", "Good"),
      include.lowest = TRUE
    )
  )
  
# Display an example of some visibility levels
data %>%
  slice_sample(n=5) %>%
  select(c(Visibility, VisibilityLevel))
## # A tibble: 5 x 2
##   Visibility VisibilityLevel
##        <dbl> <fct>          
## 1        9.9 Good           
## 2       10   Good           
## 3        3.3 Poor           
## 4        8.2 Moderate       
## 5       18.4 Good

2.4. Clean the Counties

2.4.1. Rewriting the County names

The column Address contains the name of each county seat, followed by “, Romania”. However, we would like a column called County that contains only the name of the county (not of its seat), to use when plotting Romania’s map. The county names computed here must match the ones used in the GeoJSON file that describes the counties’ borders (e.g. Bucharest must be renamed to Bucuresti).

We also create a new column, Mnemonic, holding each county’s short code (e.g. CJ for Cluj), which can be used in interactive maps.

romania_counties = c("Alba","Arad","Arges","Bacau","Bihor","Bistrita-Nasaud","Botosani","Brasov","Braila","Bucuresti","Buzau","Caras-Severin","Calarasi","Cluj","Constanta","Covasna","Dambovita","Dolj","Galati","Giurgiu","Gorj","Harghita","Hunedoara","Ialomita","Iasi","Maramures","Mehedinti","Mures","Neamt","Olt","Prahova","Satu Mare","Salaj","Sibiu","Suceava","Teleorman","Timis","Tulcea","Vaslui","Valcea","Vrancea")
romania_county_seats = c("Alba Iulia","Arad","Pitesti","Bacau","Oradea","Bistrita","Botosani","Brasov","Braila","Bucuresti","Buzau","Resita","Calarasi","Cluj-Napoca","Constanta","Sfantu Gheorghe","Targoviste","Craiova","Galati","Giurgiu","Targu Jiu","Miercurea Ciuc","Deva","Slobozia","Iasi","Baia Mare","Drobeta-Turnu Severin","Targu Mures","Piatra Neamt","Slatina","Ploiesti","Satu Mare","Zalau","Sibiu","Suceava","Alexandria","Timisoara","Tulcea","Vaslui","Ramnicu Valcea","Focsani")
romania_mnemonics = c("AB","AR","AG","BC","BH","BN","BT","BV","BR","B","BZ","CS","CL","CJ","CT","CV","DB","DJ","GL","GR","GJ","HR","HD","IL","IS","MM","MH","MS","NT","OT","PH","SM","SJ","SB","SV","TR","TM","TL","VS","VL","VN")
# First create a column CountySeat, map the special "Bihor" address to its seat (Oradea),
# translate Bucharest to Bucuresti, and then, based on the above lists, create the columns County and Mnemonic
data = data %>%
  mutate(CountySeat = gsub(', Romania', '', Address)) %>%
  mutate(CountySeat = ifelse(CountySeat == "Bihor", "Oradea", CountySeat)) %>%
  mutate(
    CountySeat = ifelse(
      CountySeat == "Bucharest",
      "Bucuresti",
      CountySeat
    )
  ) %>%
  mutate(
    County = plyr::mapvalues(
      x = CountySeat,
      from = romania_county_seats,
      to = romania_counties
    ),
    Mnemonic = plyr::mapvalues(
      x = CountySeat,
      from = romania_county_seats,
      to = romania_mnemonics
    )
  )

# Display an example of several counties
data %>%
  slice_sample(n=5) %>%
  select(c(Address, CountySeat, County, Mnemonic))
## # A tibble: 5 x 4
##   Address                  CountySeat      County    Mnemonic
##   <chr>                    <chr>           <chr>     <chr>   
## 1 Sfantu Gheorghe, Romania Sfantu Gheorghe Covasna   CV      
## 2 Zalau, Romania           Zalau           Salaj     SJ      
## 3 Bacau, Romania           Bacau           Bacau     BC      
## 4 Deva, Romania            Deva            Hunedoara HD      
## 5 Ploiesti, Romania        Ploiesti        Prahova   PH
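plyr::mapvalues() works well here; an equivalent base-R lookup uses a named vector, sketched below on a toy subset of the county seats and counties defined above (the variable names are hypothetical). One difference: mapvalues() leaves values it cannot match unchanged, while the named-vector lookup returns NA, which makes an unexpected county seat easier to spot.

```r
# Toy subset of the county seat -> county pairs
seats    = c("Deva", "Zalau", "Sfantu Gheorghe")
counties = c("Hunedoara", "Salaj", "Covasna")

# A named character vector acts as a lookup table: names are the keys
county_of_seat = setNames(counties, seats)

# Unknown keys yield NA instead of passing through unchanged
print(unname(county_of_seat[c("Zalau", "Deva", "Unknown")]))  # "Salaj" "Hunedoara" NA
```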

2.4.2. Complete Ilfov’s weather

We were not sure what the API would return for Ilfov (possibly the weather of Bucharest, which lies inside it), so we skipped this county’s API calls and used the free requests for the other counties.

In this project, we assume that Ilfov and Bucharest share the same weather.

ilfov_data = data %>%
  filter(County == "Bucuresti") %>%
  mutate(
    Address = "Ilfov, Romania",
    County = "Ilfov",
    Mnemonic = "IF",
    CountySeat = "Bucuresti",
    Latitude = 44.5355,
    Longitude = 26.2324
  )

data = rbind(data, ilfov_data)

rm(ilfov_data)

romania_counties = append(romania_counties, "Ilfov")
romania_county_seats = append(romania_county_seats, "Bucuresti") 
romania_mnemonics = append(romania_mnemonics, "IF")

2.5. Categorizing Time into Seasons

For some of our plots, we need to compare the seasons of the year, so we create a new column, Season, specifically for this purpose.

# Assumes Datetime is stored as text starting with the two-digit month (e.g. "MM/DD/YYYY")
compute_season_from_date = function(date_col) {
  month_col = substr(date_col, 1, 2)

  # Vectorized month -> season mapping; December, January, February fall through to Winter
  case_when(
    month_col %in% c("03", "04", "05") ~ "Spring",
    month_col %in% c("06", "07", "08") ~ "Summer",
    month_col %in% c("09", "10", "11") ~ "Autumn",
    TRUE ~ "Winter"
  )
}

data = data %>%
  mutate(Season = compute_season_from_date(Datetime))
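Meteorological seasons group the months in threes starting from December, so the month number alone determines the season. As a hypothetical alternative to compute_season_from_date(), integer division by 3 maps each month to a season index; like the function above, the sketch assumes the month is the first two characters of the Datetime string.

```r
# Hypothetical helper: month number (1-12) -> meteorological season
season_of_month = function(month) {
  seasons = c("Winter", "Spring", "Summer", "Autumn")
  # Dec (12) -> 4 %% 4 = 0, Jan/Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
  seasons[(month %/% 3) %% 4 + 1]
}

# Month taken from "MM/DD/YYYY" strings
months = as.integer(substr(c("12/24/2015", "01/05/2016", "07/14/2016"), 1, 2))
print(season_of_month(months))  # "Winter" "Winter" "Summer"
```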

3. Breaking down the Weather

3.1. Exploring the Weather Types

First, we select the subset of columns used in this analysis.

# Select relevant columns for this analysis.
data_wt_cond = data %>% 
  select(
    "Season", "County", "Mnemonic", "Latitude", "Longitude",
     "Mist":"HeavyFreezingRain",
     "ConditionPartiallyCloudy":"ConditionRain"
    )
wt_cols = 6:42
cond_cols = 43:46

3.1.1. Unusual Weather Types

There is a wide variety of weather types that can be identified in Romania during the year’s four seasons. The Visual Crossing Weather API labels 37 distinct weather types in our data set, but which unusual weather is the most common in Romania? We start by looking at the exact number of occurrences of each weather type.

# Create data frame for the first plot
data_wt_cond_1 = data_wt_cond %>%
  summarise(across(all_of(wt_cols), sum, na.rm = TRUE)) %>%
  gather(key = "WeatherType", value = "Occurrences") %>%
  arrange(Occurrences) %>%
  mutate(WeatherType = factor(WeatherType, WeatherType))

total_occurrences = (data_wt_cond_1 %>% summarise(sum(Occurrences)))[[1]]

data_wt_cond_1 = data_wt_cond_1 %>%
  mutate(OccurrencesFreq = round(Occurrences / total_occurrences * 100, 3))

# Define the tooltip text for each weather type (paste0() is vectorized over its inputs)
get_occ_text = function(type_col, occ_col, occ_freq_col) {
  paste0(
    "Weather: ", type_col, "\n",
    occ_freq_col, "% of Unusual Weather\n",
    occ_col, " observed occurrences"
  )
}

plot_wt_cond_1 = data_wt_cond_1 %>%
  ggplot(aes(
    x = WeatherType,
    y = OccurrencesFreq,
    text = get_occ_text(WeatherType, Occurrences, OccurrencesFreq)
  )) +
  geom_segment(
    aes(x = WeatherType, xend = WeatherType, y = 0, yend = OccurrencesFreq),
    color = "#82a3e0",
    alpha = 1
  ) +
  geom_point(color="#82a3e0", size=2, alpha=1) +
  coord_flip() +
  labs(
    title = "Frequency of Unusual Weather Conditions in Romania",
    x = "",
    y = "% of unusual weather occurrences"
  )

rm(data_wt_cond_1, total_occurrences)

ggplotly(plot_wt_cond_1, tooltip = "text")
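Note that the percentages in this plot are shares of all unusual-weather occurrences, not shares of days: a single day can report several weather types, so the two can differ. A toy example with hypothetical one-hot columns for four days, where the last day has no WeatherType reported:

```r
# Hypothetical one-hot columns: 4 days, NA where no WeatherType was reported
days = data.frame(
  Mist = c(TRUE, TRUE, TRUE, NA),
  Fog  = c(TRUE, FALSE, FALSE, NA),
  Rain = c(FALSE, TRUE, FALSE, NA)
)

# Occurrences per type, ignoring NA days (as sum(..., na.rm = TRUE) does above)
occurrences = colSums(days, na.rm = TRUE)  # Mist 3, Fog 1, Rain 1

# Share of all occurrences (the quantity plotted above); sums to 100
share_of_occurrences = round(occurrences / sum(occurrences) * 100, 3)  # 60, 20, 20

# Share of days with a reported WeatherType; can sum to more than 100
days_reported = sum(complete.cases(days))  # 3
share_of_days = round(occurrences / days_reported * 100, 3)  # 100, 33.333, 33.333
```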

The lollipop plot above shows that Mist has the largest number of occurrences during 2011-2021 when summing over all of Romania’s counties. This might be explained by the counties located in the Carpathian Mountains, where this type of weather is more frequent, together with Rain, Fog, and Snow.

In order to check this hypothesis, we plot the most common unusual weather type for each county in the following map.
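The per-county selection in the code below keeps every row equal to the county’s maximum, so ties produce several rows for the same county (as happens later with Timis in winter). A base-R sketch on hypothetical totals, plus one possible tie-breaking rule (keeping the first row per county), which differs from the manual choice made for Timis further below:

```r
# Hypothetical per-county totals in long format (one row per county and weather type)
totals = data.frame(
  County      = c("Cluj", "Cluj", "Timis", "Timis"),
  WeatherType = c("Rain", "Mist", "Rain", "Mist"),
  Occurrences = c(50, 40, 30, 30)
)

# Each row's county maximum, then keep the rows that reach it
# (like group_by(County) %>% filter(Occurrences == max(Occurrences)))
county_max = ave(totals$Occurrences, totals$County, FUN = max)
top = totals[totals$Occurrences == county_max, ]
print(top)  # Cluj keeps 1 row, Timis keeps 2 (tie at 30)

# One possible tie-break: keep only the first row of each county
top_one = top[!duplicated(top$County), ]
print(top_one$WeatherType)  # "Rain" "Rain"
```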

library(geojsonio)
## Registered S3 method overwritten by 'geojsonsf':
##   method        from   
##   print.geojson geojson
## 
## Attaching package: 'geojsonio'
## The following object is masked from 'package:base':
## 
##     pretty
library(broom)

romania_df = geojson_read("romania.geojson", what = "sp")
romania_map = tidy(romania_df, region = "name")

data_wt_cond_2 = data_wt_cond %>%
  select(c(County, all_of(wt_cols))) %>%
  group_by(County) %>%
  summarise(across(1:37, sum, na.rm = TRUE)) %>%
  gather(key = "WeatherType", value = "Occurrences", -County) %>%
  group_by(County) %>%
  filter(Occurrences == max(Occurrences)) %>%
  mutate(WeatherType = factor(WeatherType, WeatherType)) %>%
  select(-Occurrences) %>%
  ungroup()

map_wt_cond_2 = romania_map %>%
  left_join(. , data_wt_cond_2, by = c("id" = "County"))

plot_wt_cond_2 = ggplot() +
  geom_polygon(
    data = map_wt_cond_2,
    aes(fill = WeatherType, x = long, y = lat, group = group),
    color="#dbdbdb"
  ) +
  coord_map() +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_manual(
    values = c("#b1b3c7", "#848bc4", "#888a99", "#343873"),
    labels = c("Mist", "Sky Coverage Increasing", "Fog", "Rain")
  ) +
  labs(
    title = "Most Unusual Weather Conditions in Romania",
    fill = "Weather Type"
  )

rm(data_wt_cond_2, map_wt_cond_2)

plot_wt_cond_2

The map suggests that Mist is the dominant weather type mostly in the southern and eastern parts of the country, while Fog dominates in the Eastern Carpathians and Rain in Mures and Sibiu, inside the Transylvanian Plateau. Note that mist is closely related to fog, but contains fewer condensed water droplets.

3.1.2. Other Weather Conditions

Here we look at the 4 conditions reported in the Conditions column. The pie chart below shows the overall distribution of these conditions over the entire period, across all of Romania’s counties; the most frequent condition is Partially Cloudy.

data_wt_cond_3 = data_wt_cond %>%
  summarise(across(all_of(cond_cols), sum, na.rm = TRUE)) %>%
  gather(key = "Condition", value = "Occurrences") %>%
  arrange(desc(Occurrences)) %>%
  mutate(Condition = factor(Condition, Condition)) %>%
  mutate(ypos = cumsum(Occurrences) - 0.5 * Occurrences )

total_occurrences = (data_wt_cond_3 %>% summarise(sum(Occurrences)))[[1]]

data_wt_cond_3 = data_wt_cond_3 %>%
  mutate(OccurrencesFreq = round(Occurrences / total_occurrences * 100, 2))

plot_wt_cond_3 = data_wt_cond_3 %>%
  ggplot(aes(x = "", y = Occurrences, fill = Condition)) +
  geom_bar(stat = "identity", width = 1, color="white") +
  coord_polar("y", start = 0) +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(title = "Distribution of registered Sky Conditions") +
  scale_fill_manual(
    values = c("#0c9fc7", "#041b8f", "#02d7e3", "#363e66"),
    labels = c("Partially Cloudy", "Rain", "Clear", "Overcast")
  ) +
  geom_text(
    aes(y = total_occurrences - ypos, label = paste0(OccurrencesFreq, "%")),
    color = "white",
    size = 6
  )

rm(data_wt_cond_3)

plot_wt_cond_3
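The label positions in the pie chart come from simple arithmetic. Assuming ggplot2’s default stacking, where the first factor level (the most frequent condition here) is drawn on top, slice i spans from total - cumsum(occ)[i] up to that value plus occ[i], so its midpoint is total - (cumsum(occ)[i] - 0.5 * occ[i]), i.e. total_occurrences - ypos. A check on hypothetical counts:

```r
# Hypothetical condition counts, sorted in decreasing order like data_wt_cond_3
occ = c(PartiallyCloudy = 50, Rain = 30, Clear = 15, Overcast = 5)
total = sum(occ)

# Same formula as the ypos column above
ypos = cumsum(occ) - 0.5 * occ  # 25, 65, 87.5, 97.5

# Slice bounds when the first level is stacked on top
lower = total - cumsum(occ)     # 50, 20, 5, 0
upper = lower + occ             # 100, 50, 20, 5

# The label position total - ypos is exactly each slice's midpoint
label_y = total - ypos          # 75, 35, 12.5, 2.5
stopifnot(isTRUE(all.equal(label_y, (lower + upper) / 2)))
```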

The next step would be to plot these conditions on Romania’s map; however, we go a step further and plot their distribution across the four seasons. The maps below show the most frequently observed condition in each county during spring, summer, autumn, and winter.

library(leaflet)

romania_df = geojson_read("romania.geojson", what = "sp")

data_wt_cond_4 = data_wt_cond %>%
  select(c(Season, County, Mnemonic, all_of(cond_cols))) %>%
  group_by(Season, County, Mnemonic) %>%
  summarise(across(1:4, sum, na.rm = TRUE)) %>%
  gather(key = "SkyCondition", value = "Occurrences", -c(County, Season, Mnemonic)) %>%
  group_by(Season, County, Mnemonic) %>%
  filter(Occurrences == max(Occurrences)) %>%
  mutate(SkyCondition = factor(SkyCondition, SkyCondition)) %>%
  ungroup()
## `summarise()` has grouped output by 'Season', 'County'. You can override using
## the `.groups` argument.
# Surprisingly, during Winter, Timis has both Rain and Partially Cloudy as the most
# encountered conditions (both with 140 occurrences), so we must pick one to display.
# We chose Partially Cloudy, since this appears to be the overall trend in Romania
data_wt_cond_4 = data_wt_cond_4 %>%
  filter(County != "Timis" | Season != "Winter" | SkyCondition != "ConditionRain")

convert_cond_col_to_text = function(cond_col) {
  from_arr = c("ConditionClear", "ConditionPartiallyCloudy", "ConditionOvercast", "ConditionRain")
  to_arr = c("Clear", "Partially Cloudy", "Overcast", "Rain")

  return(plyr::mapvalues(
    x = cond_col,
    from = from_arr,
    to = to_arr
  ))
}

plot_conditions_distribution_for_season = function(season_name, season_palette) {
  data_wt_cond_season_4 = data_wt_cond_4 %>%
    filter(Season == season_name) %>%
    select(County, SkyCondition)

  map_wt_season_4 = romania_df
  
  map_wt_season_4@data = map_wt_season_4@data %>%
    left_join(data_wt_cond_season_4, by = c("name" = "County"))
  
  cond_palette = colorFactor(
    palette = season_palette,
    levels = c("ConditionClear", "ConditionPartiallyCloudy", "ConditionOvercast", "ConditionRain"),
    domain = map_wt_season_4@data$SkyCondition,
    na.color = "transparent"
  )
  
  season_labels <- sprintf(
    "<strong>%s</strong><br/>Condition: <em>%s</em>",
    map_wt_season_4$name,
    convert_cond_col_to_text(map_wt_season_4$SkyCondition)  
  ) %>% lapply(htmltools::HTML)
  
  
  plot = leaflet(
      map_wt_season_4,
      options = leafletOptions(zoomSnap = 0.5, zoomDelta = 0.5)
    ) %>%
    setView(25, 46, 6.5) %>%
    addProviderTiles(providers$OpenStreetMap.HOT) %>%
    addPolygons(
      fillColor = ~cond_palette(SkyCondition),
      opacity = 1,
      color = "#dbdbdb",
      weight = 2,
      fillOpacity = 0.7,
      highlightOptions = highlightOptions(
        weight = 4,
        color = "#6b6b6b",
        fillOpacity = 0.7,
        bringToFront = TRUE
      ),
      label = season_labels,
      labelOptions = labelOptions(
        style = list("font-weight" = "normal", padding = "3px 8px"),
        textsize = "15px",
        direction = "auto"
      )
    ) %>%
    addLegend(
      pal = cond_palette,
      values = ~SkyCondition,
      opacity = 0.7,
      title = paste(season_name, "Conditions"),
      position = "topright",
      labFormat = labelFormat(transform = convert_cond_col_to_text)
    )
  
  return(plot)
}

spring_palette = c("#79f520", "#6aba2d", "#357a18", "#2c6314")
summer_palette = c("#f5f120", "#bab72d", "#7a7818", "#636214")
autumn_palette = c("#f56b20", "#ba5e2d", "#7a3918", "#632c14")
winter_palette = c("#20d9f5", "#2daaba", "#18737a", "#0e3b40")

plot_conditions_distribution_for_season("Spring", spring_palette)
plot_conditions_distribution_for_season("Summer", summer_palette)
plot_conditions_distribution_for_season("Autumn", autumn_palette)
plot_conditions_distribution_for_season("Winter", winter_palette)

We can draw a few conclusions from this section:

  • during spring, only Rain and Partially Cloudy weather are dominant in Romania;
  • during summer and autumn, no county has a dominant Overcast weather;
  • winter is the only season in which Overcast weather is dominant, in as many as 10 counties;
  • in every season, Partially Cloudy weather dominates most of Romania;
  • Cluj and Sibiu show a dominant Rain weather in all seasons.

3.2. Exploring the Temperatures

3.3. Exploring the Winds

3.4. Exploring the Humidity

3.5. Exploring the Precipitations and their effects

4. Making sense of the Weather

4.1. Surface connections

4.2. Showcasing Weather Types by Temperature

4.4. How powerful are Romania’s storms?

5. Conclusions

What should we do with this piece of code?

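# Keep one row per location, so that each county seat is plotted once
small_data = data %>% distinct(Address, .keep_all = TRUE)

# Longitude goes on the x axis and Latitude on the y axis, as on a map
ggplot(small_data, aes(x = Longitude, y = Latitude)) +
  geom_point(
    color = "orange",
    fill = "#69b3a2",
    shape = 21,
    alpha = 0.5,
    size = 6,
    stroke = 2
  ) +
  # Label each point with its county code instead of a row number
  geom_text(
    aes(label = Mnemonic),
    nudge_x = 0.25,
    nudge_y = 0.25,
    check_overlap = TRUE
  )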